Unsupervised Abbreviation Detection in Clinical Narratives

Authors

  • Markus Kreuzthaler
  • Michel Oleynik
  • Alexander Avian
  • Stefan Schulz
Abstract

Clinical narratives in electronic health record systems are a rich source of patient-based information. They pose an ongoing challenge for natural language processing because of their high compactness and abundance of short forms. German medical texts exhibit numerous ad hoc abbreviations that terminate with a period character. Disambiguating period characters is therefore an important task for sentence and abbreviation detection. We address this task by combining co-occurrence information on word types with trailing period characters, a large domain dictionary, and a simple rule engine, thus merging statistical and dictionary-based disambiguation strategies. The unsupervised approach presented in this paper reached an F-measure of 0.95. The results are promising for a domain-independent abbreviation detection strategy, because our approach avoids the retraining of models and the use-case-specific feature engineering required by supervised machine learning approaches.
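
The combination named in the abstract (co-occurrence statistics, a dictionary, and a rule) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `period_ratio` and `is_abbreviation`, the threshold of 0.8, the order in which the three signals are applied, and the lowercase-successor rule are all assumptions made for illustration.

```python
from collections import Counter


def period_ratio(tokens):
    """Hypothetical helper: for each lowercased word type, return the share
    of its occurrences that carry a trailing period."""
    with_period = Counter()
    total = Counter()
    for tok in tokens:
        if tok.endswith(".") and len(tok) > 1:
            base = tok[:-1].lower()
            with_period[base] += 1
        else:
            base = tok.lower()
        total[base] += 1
    return {word: with_period[word] / total[word] for word in total}


def is_abbreviation(token, next_token, ratios, dictionary, threshold=0.8):
    """Hypothetical decision function combining a rule, a dictionary lookup,
    and the period statistics. The ordering and threshold are assumptions."""
    if not token.endswith(".") or len(token) < 2:
        return False
    base = token[:-1].lower()
    # Assumed rule: a lowercase successor suggests the period did not end a sentence.
    if next_token is not None and next_token[:1].islower():
        return True
    # Assumed dictionary check: a known full word plus period is read as a sentence end.
    if base in dictionary:
        return False
    # Statistical fallback: types that almost always carry a period are abbreviations.
    return ratios.get(base, 0.0) >= threshold


# Toy data, invented for this sketch.
tokens = ["Pat.", "klagt", "über", "Schmerzen.", "Pat.", "hat",
          "Fieber.", "Fieber", "seit", "gestern."]
dictionary = {"klagt", "über", "schmerzen", "hat", "fieber", "seit", "gestern"}
ratios = period_ratio(tokens)
```

With this toy input, `is_abbreviation("Pat.", "klagt", ratios, dictionary)` returns `True`, while `is_abbreviation("Schmerzen.", "Pat.", ratios, dictionary)` returns `False` because "schmerzen" is found in the dictionary.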

Similar resources

Detection of sentence boundaries and abbreviations in clinical narratives

BACKGROUND: In Western languages the period character is highly ambiguous, due to its double role as sentence delimiter and abbreviation marker. This is particularly relevant in clinical free-texts characterized by numerous anomalies in spelling, punctuation, vocabulary and with a high frequency of short forms. METHODS: The problem is addressed by two binary classifiers for abbreviation and sen...

Disambiguation of Period Characters in Clinical Narratives

The period character’s meaning is highly ambiguous due to the frequency of abbreviations that must be followed by a period. We have developed a hybrid method for period character disambiguation and the identification of abbreviations, combining rules that explore regularities in the right context of the period with lexicon-based, statistical methods which scrutinize the preceding token. T...

BotOnus: an online unsupervised method for Botnet detection

Botnets are recognized as one of the most dangerous threats to the Internet infrastructure. They are used for malicious activities such as launching distributed denial of service attacks, sending spam, and leaking personal information. Existing botnet detection methods have produced a number of good ideas but are still far from complete, since most of them cannot detect botnets in an early stage ...

Fast Unsupervised Automobile Insurance Fraud Detection Based on Spectral Ranking of Anomalies

Collecting insurance fraud samples is costly and, if performed manually, very time consuming. This suggests the use of unsupervised models. One accurate method in this regard is Spectral Ranking of Anomalies (SRA), which has been shown to work better than other methods specifically for auto insurance fraud detection. However, this approach is not scalable to large samples and is not appro...

Combining of Magnitude and Direction of Change Indices to Unsupervised Change Detection in Multitemporal Multispectral Remote Sensing Images

In remote sensing, image-based change detection techniques analyze two images acquired over the same area at different times t1 and t2 to identify the changes that occurred on the Earth's surface. Change detection approaches are mainly categorized as supervised and unsupervised. Generating the change index is a key step for change detection in multi-temporal remote sensing images. Unsupervised chan...

Publication date: 2016